Current Developments of STO - the Danish Lexicon Project for NLP and HLT Applications
Abstract
The Centre for Language Technology (Center for Sprogteknologi, CST) is in charge of a national project developing a large-scale Danish lexicon for HLT and NLP applications. The short name of the project is STO, which stands for SprogTeknologisk Ordbase (Lexical Database for Language Technology). The project is inspired by principles and methods applied in the multilingual LE-PAROLE project (1996-98), whose aim was to develop harmonised written language resources for 12 EU languages. The Danish PAROLE lexicon was produced by CST, and the STO project benefits greatly from the experience gained in that work. This paper deals with a few central tasks of the ongoing project. It discusses how a smaller lexical resource produced in a multilingual environment is being developed into a large-scale monolingual resource. Two different methods of extending the lexicon are presented in detail: extending the linguistic coverage, and refining the linguistic description by including more detailed language-specific information. Finally, some perspectives for exploiting the lexicon and the development of an Internet-based user interface are presented. The STO project is funded by the Danish Ministry for Science, Technology and Development for a period of three ...
Similar resources
STO: A Danish Lexicon Resource - Ready for Applications
This paper deals with the STO lexicon, the most comprehensive computational lexicon of Danish developed for NLP/HLT applications, which is now ready for use. Danish was one of the 12 EU-languages participating in the LE-PAROLE and SIMPLE projects; therefore it was obvious to continue this work building on our experience obtained from these projects. The material for Danish produced wi...
A Corpus-based Syntactic Lexicon for Adverbs
A word class often neglected in the field of NLP resources, namely adverbs, has lately been described in a computational lexicon produced at CST as one of the results of a Ph.D.-project. The adverb lexicon, which is integrated in the Danish STO lexicon, gives detailed syntactic information on the type of modification and position, as well as on other syntactic properties of approx 800 Danish ad...
Lemma selection in domain specific computational lexica - some specific problems
This paper describes the lemma selection process of a Danish computational lexicon, the STO project, for domain specific language and focuses on some specific problems encountered during the lemma selection process. After a short introduction to the STO project and an explanation of why the lemmas are selected from a corpus and not chosen from existing dictionaries, the lemma selection process ...
Converting Unicode Lexicon and Lexical Tools for ASCII NLP Applications
The NLP SPECIALIST Lexicon and Lexical Tools, distributed by National Library of Medicine (NLM), have been released in Unicode (UTF-8) format since 2006. Lexicon is used as corpus while Lexical Tools are used as software packages in NLP (Natural Language Processing) projects. Some NLP projects still only deal with ASCII (7-bit) characters. This paper describes how to convert UTF-8 Lexicon and i...
Towards a Strategy for a Representation of Collocations - Extending the Danish PAROLE-lexicon
We describe our attempts to formulate a pragmatic definition and a partial typology of the lexical category of ’collocation’ taking both lexicographical and computational aspects into consideration. This provides a suitable basis for encoding collocations in an NLP-lexicon. Further, this paper explains the principles of an operational encoding strategy which is applied to a core section of the ...
Publication date: 2002